Load the required libraries

library(tidyverse)
library(knitr)

When generating the report, the file does not have to be re-read on every run: set the chunk option cache=TRUE. To suppress informational messages and warnings, add message=FALSE and warning=FALSE.

df <- read_csv("../../../project/1-data/1-sample_data.csv")
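The chunk options mentioned above are normally written in the header of an individual chunk. As a minimal sketch, assuming the same behaviour is wanted for every chunk in the document, they could instead be set once in a setup chunk with knitr's `opts_chunk$set()`:

```r
library(knitr)

# Apply these options to every chunk that follows in the document.
opts_chunk$set(
  cache   = TRUE,   # reuse stored chunk results instead of re-running the code
  message = FALSE,  # hide informational messages
  warning = FALSE   # hide warnings
)
```

Note that the knitr cache is invalidated when the chunk code or its options change, not when the CSV file itself changes, so the cache has to be cleared manually after the data is updated.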

Dimensions of the data file:

dim(df)
## [1] 1000000       9

Variable overview

(for nicer printing, we use the kable() function and split the variables across several rows)

summary(df)
##        id                y       amount_current_loan     term          
##  Min.   :      1   Min.   :0.0   Min.   : 10802      Length:1000000    
##  1st Qu.: 250001   1st Qu.:0.0   1st Qu.:174394      Class :character  
##  Median : 500000   Median :0.5   Median :269676      Mode  :character  
##  Mean   : 500000   Mean   :0.5   Mean   :316659                        
##  3rd Qu.: 750000   3rd Qu.:1.0   3rd Qu.:435160                        
##  Max.   :1000000   Max.   :1.0   Max.   :789250                        
##                                                                        
##  credit_score       loan_purpose       yearly_income       home_ownership    
##  Length:1000000     Length:1000000     Min.   :    76627   Length:1000000    
##  Class :character   Class :character   1st Qu.:   825797   Class :character  
##  Mode  :character   Mode  :character   Median :  1148550   Mode  :character  
##                                        Mean   :  1344805                     
##                                        3rd Qu.:  1605899                     
##                                        Max.   :165557393                     
##                                        NA's   :219439                        
##   bankruptcies   
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.1192  
##  3rd Qu.:0.0000  
##  Max.   :7.0000  
##  NA's   :1805

In the final report, the R code can be left out by setting the echo=FALSE option.

| id | y | amount_current_loan | term | credit_score | loan_purpose | yearly_income | home_ownership | bankruptcies |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| Min. : 1 | Min. :0.0 | Min. : 10802 | Length:1000000 | Length:1000000 | Length:1000000 | Min. : 76627 | Length:1000000 | Min. :0.0000 |
| 1st Qu.: 250001 | 1st Qu.:0.0 | 1st Qu.:174394 | Class :character | Class :character | Class :character | 1st Qu.: 825797 | Class :character | 1st Qu.:0.0000 |
| Median : 500000 | Median :0.5 | Median :269676 | Mode :character | Mode :character | Mode :character | Median : 1148550 | Mode :character | Median :0.0000 |
| Mean : 500000 | Mean :0.5 | Mean :316659 | NA | NA | NA | Mean : 1344805 | NA | Mean :0.1192 |
| 3rd Qu.: 750000 | 3rd Qu.:1.0 | 3rd Qu.:435160 | NA | NA | NA | 3rd Qu.: 1605899 | NA | 3rd Qu.:0.0000 |
| Max. :1000000 | Max. :1.0 | Max. :789250 | NA | NA | NA | Max. :165557393 | NA | Max. :7.0000 |
| NA | NA | NA | NA | NA | NA | NA's :219439 | NA | NA's :1805 |

TO DO

Review the NA values and the distribution of y, and examine the character-type variables in more detail.
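The items on this list could be started along the following lines. This is only a sketch: `toy` is a small made-up data frame standing in for `df` (its values are invented for illustration), and only the column names are borrowed from the summary above.

```r
library(dplyr)

# Hypothetical toy data standing in for df; the values are made up.
toy <- tibble(
  y             = c(0, 1, 1, 0),
  term          = c("short", "long", "short", "short"),
  yearly_income = c(100, NA, 300, NA),
  bankruptcies  = c(0, 1, NA, 0)
)

# Number of NA values in each column.
na_counts <- colSums(is.na(toy))

# Distribution of the target variable y.
y_counts <- count(toy, y)

# Frequency table for every character column.
char_counts <- toy %>%
  select(where(is.character)) %>%
  lapply(table)
```

On the real data, the same calls would be applied to `df` instead of `toy`.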

df$loan_purpose <- as.factor(df$loan_purpose)
df$y <- as.factor(df$y)
summary(df$loan_purpose) %>%
  kable()

| | x |
|:---|---:|
| business_loan | 17756 |
| buy_a_car | 11855 |
| buy_house | 6897 |
| debt_consolidation | 785428 |
| educational_expenses | 992 |
| home_improvements | 57517 |
| major_purchase | 3727 |
| medical_bills | 11521 |
| moving | 1548 |
| other | 91481 |
| renewable_energy | 109 |
| small_business | 3242 |
| take_a_trip | 5632 |
| vacation | 1166 |
| wedding | 1129 |

Or:

df %>%
  group_by(loan_purpose) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  kable()

| loan_purpose | n |
|:---|---:|
| debt_consolidation | 785428 |
| other | 91481 |
| home_improvements | 57517 |
| business_loan | 17756 |
| buy_a_car | 11855 |
| medical_bills | 11521 |
| buy_house | 6897 |
| take_a_trip | 5632 |
| major_purchase | 3727 |
| small_business | 3242 |
| moving | 1548 |
| vacation | 1166 |
| wedding | 1129 |
| educational_expenses | 992 |
| renewable_energy | 109 |
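As a shorter alternative to the grouping pipeline above, dplyr's `count()` with `sort = TRUE` does the grouping, counting, and descending sort in one call. A sketch on made-up data (`toy` is hypothetical and not taken from the file; only the column name matches the report):

```r
library(dplyr)

# Hypothetical toy data; only the column name matches the report.
toy <- tibble(
  loan_purpose = c("other", "moving", "other", "wedding", "other", "moving")
)

# Long form, as used in the report.
long_form <- toy %>%
  group_by(loan_purpose) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Short form: count() groups, counts, and sorts by n in descending order.
short_form <- toy %>%
  count(loan_purpose, sort = TRUE)
```

Both objects hold the same rows in the same order, so either can be passed on to `kable()`.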

Once you have chosen the variables, visualize them

df %>%
  group_by(y, loan_purpose) %>%
  summarise(n = n()) %>%
  ggplot(aes(fill=y, y=n, x=loan_purpose)) + 
  geom_bar(position="dodge", stat="identity") + 
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  theme_dark()

Loans taken for the following purposes account for the most bankruptcies (y == 1):

df %>%
  filter(y == 1) %>%
  group_by(loan_purpose) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  kable()

| loan_purpose | n |
|:---|---:|
| debt_consolidation | 391875 |
| other | 44888 |
| home_improvements | 27274 |
| business_loan | 10356 |
| medical_bills | 6286 |
| buy_a_car | 5810 |
| buy_house | 3652 |
| take_a_trip | 2870 |
| small_business | 2152 |
| major_purchase | 2120 |
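These counts largely follow how often each purpose occurs overall, since debt_consolidation is by far the most frequent purpose. As a possible complement, the share of y == 1 within each purpose can be computed. A minimal sketch on made-up data (`toy` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; y is a factor, as in the report after as.factor().
toy <- tibble(
  loan_purpose = c("other", "other", "other", "other", "moving", "moving"),
  y            = factor(c(1, 0, 0, 0, 1, 1))
)

# Per purpose: number of loans, number with y == 1, and their share.
rates <- toy %>%
  group_by(loan_purpose) %>%
  summarise(
    n          = n(),
    n_bankrupt = sum(y == 1),
    share      = mean(y == 1)
  ) %>%
  arrange(desc(share))
```

Sorting by `share` rather than by the raw count puts rare but risky purposes ahead of merely common ones.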

Additional suggestions for improving interactivity

Interactive tables with datatable (DT)

library(DT)
df %>%
  group_by(y, loan_purpose) %>%
  summarise(n = n()) %>%
  datatable()

Interactive plots with plotly

library(plotly)
df %>%
  group_by(y, credit_score) %>%
  summarise(n = n()) %>%
  plot_ly(x = ~credit_score, y = ~n, name = ~y, type = "bar")